Support tool callbacks in MCP sampling#2998
Conversation
|
/review |
docker-agent
left a comment
There was a problem hiding this comment.
Assessment: 🟡 NEEDS ATTENTION
Two medium-severity findings in the newly-added sampling-with-tools code. The core stream-aggregation logic, capability handshake, content building, and limits enforcement are all well-structured and correctly tested.
|
Sorry @EronWright, our project requires all the commit to be signed and verified. Could you sign them? Thanks! |
|
I moved it back to draft @EronWright the time you rewrite the history with signed commits. Sorry for the issue |
b1fad0b to
89a93e6
Compare
|
@aheritier sorry for delay, all commits are now signed/verified. |
Adds a parallel SamplingWithToolsHandler alongside the existing SamplingHandler so MCP servers can include a tools array in sampling/createMessage requests. The host drives its model with those tools and returns any tool_use blocks as ToolUseContent; the server remains responsible for executing the tool and continuing the loop in a follow-up sampling request. The initialize handshake now advertises sampling.tools capability, and the MCP toolset selects the appropriate go-sdk handler (basic vs. with-tools) based on which handler is registered. Signed-off-by: Eron Wright <eronwright@gmail.com>
Mounts an in-process gomcp.NewServer on an httptest server via StreamableHTTPHandler. Its one tool, ask_with_calculator, runs a sampling loop: sends sampling/createMessage with a calculator tool, gets a tool_use back from the host LLM, "executes" the calculator, sends a follow-up sampling request carrying the tool_result, and returns the final text. The Gemini side is recorded once and replayed on subsequent runs, so the test runs offline in CI. Signed-off-by: Eron Wright <eronwright@gmail.com>
- Reject tool_use blocks under a non-assistant role explicitly rather than silently dropping the message. The MCP spec places tool_use on assistant turns, but a malformed server would previously have its message disappear from the converted chat history with no error. - Document the ordering dependency between SetSampling*Handler and Initialize in both stdio and remote MCP client setups, so future maintainers don't reorder them and silently lose the sampling handler on reconnect. Signed-off-by: Eron Wright <eronwright@gmail.com>
Use errors.New for the static loop-exhausted message and strconv.FormatInt for the calculator result instead of fmt.Errorf / fmt.Sprintf %d. Signed-off-by: Eron Wright <eronwright@gmail.com>
89a93e6 to
8e150a2
Compare
|
/review |
docker-agent
left a comment
There was a problem hiding this comment.
Assessment: 🟡 NEEDS ATTENTION
Two medium-severity issues found in the new sampling-with-tools path. Both relate to edge cases in the streaming and reconnect logic.
| if choice.FinishReason != "" { | ||
| providerFinishReason = choice.FinishReason | ||
| } | ||
| if choice.FinishReason == chat.FinishReasonStop || choice.FinishReason == chat.FinishReasonLength { |
There was a problem hiding this comment.
[MEDIUM] Early break on FinishReasonStop may drop tool-call chunks from later SSE events
drainSamplingStreamWithTools breaks out of the stream-read loop as soon as it sees FinishReasonStop or FinishReasonLength:
if choice.FinishReason == chat.FinishReasonStop || choice.FinishReason == chat.FinishReasonLength {
break
}Tool-call deltas that appear in the same chunk as the finish-reason are captured correctly (they're processed in the for _, delta := range choice.Delta.ToolCalls loop a few lines earlier). However, if a provider splits its response so that tool-call argument chunks arrive in subsequent SSE frames after the finish-reason frame, those deltas are silently discarded.
The analogous handleStream in streaming.go guards against this with a continue that skips the early-exit when the current chunk contains only tool-call deltas (no finish reason yet). That pattern is missing here.
Impact: Incomplete or missing tool calls are returned to the MCP server as a ToolUseContent block with truncated arguments, causing the server's tool-execution loop to fail or produce wrong results. The failure is silent — no error is returned.
Suggested fix: Mirror the streaming.go guard: after accumulating tool-call deltas, continue to the next chunk if choice.FinishReason == "", and only break on FinishReasonStop/FinishReasonLength once all pending deltas have been flushed.
| PromptListChangedHandler: promptChanged, | ||
| } | ||
| switch { | ||
| case c.samplingWithToolsHandler != nil: |
There was a problem hiding this comment.
[MEDIUM] Reconnect race: sampling handler may not be registered with the SDK if configureToolsetHandlers hasn't run yet
Initialize (called on both the initial connect and every supervisor-driven reconnect) reads c.samplingWithToolsHandler and c.samplingHandler at call time to decide which SDK-level callback to register:
switch {
case c.samplingWithToolsHandler != nil:
opts.CreateMessageWithToolsHandler = c.handleSamplingWithToolsRequest
case c.samplingHandler != nil:
opts.CreateMessageHandler = c.handleSamplingRequest
}
// if both are nil → no sampling callback registered with the SDK clientThe SDK client is constructed with opts and never updated afterwards. If a reconnect happens before the runtime's configureToolsetHandlers has called SetSamplingWithToolsHandler / SetSamplingHandler, both fields are nil at Initialize time. The resulting SDK session has no CreateMessage* handler at all, so any sampling/createMessage request from the server is rejected with an error for the lifetime of that session.
This is a regression from the previous unconditional registration pattern, where the basic sampling callback was always wired. The comment "callers must invoke … before any reconnect" documents the contract, but it is not enforced by any ordering guarantee in the reconnect path today.
Impact: An MCP server that triggers sampling during or shortly after a reconnect (e.g., a long-running agentic loop that reconnects due to a network hiccup) will receive an error from the sampling handler instead of a model response, breaking its tool-execution loop.
Suggested mitigation: Either (a) guarantee in the supervisor/reconnect path that configureToolsetHandlers is always called before Initialize completes, and add a test that verifies this ordering; or (b) fall back to registering a best-effort handler at Initialize time that reads the handler fields lazily at request time (similar to how handleSamplingWithToolsRequest already reads c.samplingWithToolsHandler under the mutex), so a late SetSamplingWithToolsHandler call is still effective even after reconnect.
Summary
Closes the tool callbacks functional gap in MCP sampling support — a follow-up to #2815, addressing one of the remaining items from #2809.
When an MCP server includes a
toolsarray in asampling/createMessagerequest, the host now drives its model with those tools and returns anytool_useblocks back to the server asToolUseContent. The server remains responsible for executing the tool and continuing the loop in a follow-up sampling request.sequenceDiagram participant H as cagent participant S as MCP Server participant L as LLM activate H H->>+S: tools/call {name, arguments} note over S: needs LLM inference S->>+H: sampling/createMessage<br/>{messages, tools: [...]} H->>+L: chat completion L-->>-H: ToolUseContent<br/>stopReason: "toolUse" H-->>-S: CreateMessageResult<br/>{tool_use, stopReason: "toolUse"} note over S: executes tool locally S->>+H: sampling/createMessage<br/>{messages + tool_use + tool_result, tools: [...]} H->>+L: chat completion L-->>-H: TextContent<br/>stopReason: "endTurn" H-->>-S: CreateMessageResult<br/>{text, stopReason: "endTurn"} S-->>-H: tool result deactivate HWhat's new
SamplingWithToolsHandlertype andSampleableWithToolsinterface — additive, parallel to the existingSamplingHandler/Sampleable. No breaking changes to the basic sampling path merged in feat(mcp): add sampling/createMessage support #2815.Initialize, exactly one of the SDK's mutually exclusiveClientOptions.CreateMessage*fields is populated — prefer with-tools when registered, fall back to basic.sampling.toolsso servers know the host can receive tool-enabled requests.pkg/runtime/sampling.go):text,image/audio,tool_use→ assistantToolCalls,tool_result→MessageRoleToolrows (parallel tool_results expand to multiple chat.Message rows).[]*mcp.Tool→[]tools.Toolwith a no-op handler (the server, not the host, executes).model.CreateChatCompletionStream, aggregates streamed tool calls.ContentwithTextContent+ToolUseContentblocks;stopReason: "toolUse"when tool calls are present.maxSamplingTools=64,maxSamplingToolCalls=32.e2e/sampling_test.go): mounts an in-processgomcp.NewServeron anhttptestserver viaStreamableHTTPHandler. The server exposes one tool (ask_with_calculator) whose handler drives a real sampling-with-tools loop against the connecting cagent. The Gemini side is recorded once and replayed on subsequent runs, so the test runs offline in CI.Out of scope (separate gaps from #2809)
Test plan